{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 什么是梯度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "像平时学习的倒数，微分方程，其实都是对x轴或者是y轴求导，求曲线沿着某个轴线的变化速率。\n",
    "\n",
    "例如y=1/x .求导之后可以发现，当x越大，求导值越小，说明曲线的变化速率越小"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "什么是梯度呢？梯度是一个矢量，并不局限于x或者y轴。而是x轴和y轴方向的一个矢量和。梯度反应的是函数变化速率最快的方向。矢量的模越大，则函数在这个方向变化越快"
   ]
  },
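  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch (my own illustration, not part of the original notes): autograd can compute the partial derivatives that make up a gradient. For $f(x, y) = x^2 + 3y$ the gradient is $(2x, 3)$:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "x = torch.tensor(2.0, requires_grad=True)\n",
    "y = torch.tensor(1.0, requires_grad=True)\n",
    "f = x**2 + 3*y\n",
    "f.backward()    # fills x.grad and y.grad with the partial derivatives\n",
    "x.grad, y.grad  # expected: (tensor(4.), tensor(3.)) since df/dx = 2x, df/dy = 3"
   ]
  },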
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 怎么搜索最优解"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "最优解一般包括全局最优解和局部最优解。局部最优解一般很容易找到，全局最优解往往会遇到很多问题\n",
    "\n",
    "在二维图像上，搜索最优解。先随机初始化一个x。求出该点的斜率，用一个学习率乘以该斜率。x加表示求全局最大值。x减表示求全局最小值(因为梯度是函数增长最快的方向)\n",
    "\n",
    "在高维图形中，也是先初始化值。然后求出梯度，接下来和上面一样。x-func(x,y)偏x，y-func(x,y)偏y"
   ]
  },
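  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal gradient-descent sketch of the update rule above (my own illustration; the toy function $f(x) = (x - 3)^2$ and the hyperparameters are arbitrary choices):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "x = torch.tensor(0.0, requires_grad=True)  # arbitrary initialization\n",
    "lr = 0.1  # learning rate\n",
    "for _ in range(100):\n",
    "    f = (x - 3) ** 2\n",
    "    f.backward()           # compute df/dx\n",
    "    with torch.no_grad():\n",
    "        x -= lr * x.grad   # subtract the gradient to move toward the minimum\n",
    "    x.grad.zero_()         # clear the accumulated gradient before the next step\n",
    "x  # should be close to the minimizer x = 3"
   ]
  },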
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 什么会影响你的搜索"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- 初始化的值会影响搜索最优解。初始化的值不同，可能会取得不同的最优解\n",
    "- 学习率也会影响。学习率越大，可能效果越差。迭代次数会变多。甚至可能不收敛。"
   ]
  },
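  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the learning-rate effect concretely, a small experiment (my own sketch, same kind of toy function): on $f(x) = x^2$ the update is $x \\leftarrow x - \\eta \\cdot 2x$; with $\\eta = 0.1$ it converges toward 0, while with $\\eta = 1.1$ every step overshoots and the iterates diverge."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "\n",
    "def run(lr, steps=20):\n",
    "    x = torch.tensor(1.0, requires_grad=True)\n",
    "    for _ in range(steps):\n",
    "        f = x ** 2\n",
    "        f.backward()\n",
    "        with torch.no_grad():\n",
    "            x -= lr * x.grad\n",
    "        x.grad.zero_()\n",
    "    return x.item()\n",
    "\n",
    "run(0.1), run(1.1)  # 0.1 converges toward 0; 1.1 oscillates with growing amplitude"
   ]
  },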
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 激活函数"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "神经元有多个输入，将这些输入求和加上偏置bias后，作为激活函数的输入，然后得到一个输出。这就是激活函数的作用。\n",
    "\n",
    "第一个激活函数是根据动物得到的。在刺激动物的时候，神经元不会马上作出反应，而是达到一定阈值才会。所以提出使用阶梯函数作为激活函数。但是阶梯函数求导为0，因此不适合优化，所以提出了sigmoid函数"
   ]
  },
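  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference (standard definitions, not from the original notes): the sigmoid is $\\sigma(x) = \\frac{1}{1 + e^{-x}}$, squashing any input into $(0, 1)$, and its derivative $\\sigma'(x) = \\sigma(x)\\,(1 - \\sigma(x))$ is nonzero everywhere, unlike the step function's."
   ]
  },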
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([-100.0000,  -77.7778,  -55.5556,  -33.3333,  -11.1111,   11.1111,\n",
       "          33.3333,   55.5556,   77.7778,  100.0000])"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = torch.linspace(-100,100,10)\n",
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([0.0000e+00, 1.6655e-34, 7.4564e-25, 3.3382e-15, 1.4945e-05, 9.9999e-01,\n",
       "        1.0000e+00, 1.0000e+00, 1.0000e+00, 1.0000e+00])"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "torch.sigmoid(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([-1., -1., -1., -1., -1.,  1.,  1.,  1.,  1.,  1.])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# tanh\n",
    "torch.tanh(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([  0.0000,   0.0000,   0.0000,   0.0000,   0.0000,  11.1111,  33.3333,\n",
       "         55.5556,  77.7778, 100.0000])"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# relu又叫合页函数\n",
    "# relu大于0的时候梯度不变保持为1,因此容易出现梯度爆炸\n",
    "# 优先使用这个函数作为深度学习的激活函数\n",
    "\n",
    "torch.relu(a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# loss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用pytorch自动求导"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "损失函数表现为预测值和实际值的差值平方和再求平均，mean suqure loss\n",
    "\n",
    "为了使损失函数最小，这个问题是一个凸优化问题。也涉及到求梯度等方法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "现在假设一个例子，实现求loss\n",
    "\n",
    "y = wx,y and x是知道的，我们初始化一个w\n",
    "\n",
    "mean_squre_loss = (y-wx)^2是一个关于w的函数 =(1-w)^2 我们知道loss最小w应该取1\n",
    "\n",
    "对w求导后,2(y-wx)*-x\n",
    "\n",
    "假设y=1,x=1,初始化w=2 ,2(1-w)*-1 = 2(w-1)\n",
    "\n",
    "假设w初始化为2，那么求导后结果为2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([1.])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = torch.ones(1)\n",
    "x # dim=1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "predict = torch.ones(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([2.])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "w = torch.full([1],2.)\n",
    "w"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch.nn import functional as F"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "mes = F.mse_loss(predict, w*x) #第一个系数是预测值，第二个参数是表达式，y=后面这部分的/也是准确值标签，和预测值相对"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(1.)"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mes # 这里预测值是1，实际值是2，那么loss就是1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "loss = (predict-x*w)**2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "ename": "RuntimeError",
     "evalue": "element 0 of tensors does not require grad and does not have a grad_fn",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mRuntimeError\u001b[0m                              Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-22-e71b2267cf90>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mautograd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrad\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloss\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mw\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py\u001b[0m in \u001b[0;36mgrad\u001b[0;34m(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)\u001b[0m\n\u001b[1;32m    188\u001b[0m         \u001b[0mretain_graph\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    189\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 190\u001b[0;31m     return Variable._execution_engine.run_backward(\n\u001b[0m\u001b[1;32m    191\u001b[0m         \u001b[0moutputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgrad_outputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    192\u001b[0m         inputs, allow_unused)\n",
      "\u001b[0;31mRuntimeError\u001b[0m: element 0 of tensors does not require grad and does not have a grad_fn"
     ]
    }
   ],
   "source": [
    "# 会报错是因为没有设置跟踪w求导\n",
    "# pytorch是动态图，在设置的时候，需要手动设置哪些系数需要求导\n",
    "torch.autograd.grad(loss, w) # 第一个是方程，第二个是需要求导的系数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([2.], requires_grad=True)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# 我们在这里设置追踪自动求导，因为后面要对w求导，也可以后面手动补全\n",
    "w.requires_grad_() \n",
    "# w = torch.full([1],2.,requires_grad_=True),也可以在创建的时候直接设置"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "ename": "RuntimeError",
     "evalue": "element 0 of tensors does not require grad and does not have a grad_fn",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mRuntimeError\u001b[0m                              Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-24-e71b2267cf90>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mautograd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrad\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mloss\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mw\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m~/miniconda3/lib/python3.8/site-packages/torch/autograd/__init__.py\u001b[0m in \u001b[0;36mgrad\u001b[0;34m(outputs, inputs, grad_outputs, retain_graph, create_graph, only_inputs, allow_unused)\u001b[0m\n\u001b[1;32m    188\u001b[0m         \u001b[0mretain_graph\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    189\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 190\u001b[0;31m     return Variable._execution_engine.run_backward(\n\u001b[0m\u001b[1;32m    191\u001b[0m         \u001b[0moutputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgrad_outputs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    192\u001b[0m         inputs, allow_unused)\n",
      "\u001b[0;31mRuntimeError\u001b[0m: element 0 of tensors does not require grad and does not have a grad_fn"
     ]
    }
   ],
   "source": [
    "torch.autograd.grad(loss, w)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "loss = (predict-x*w)**2 # 必须要继续更新才能求导"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(tensor([2.]),)"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "torch.autograd.grad(loss, w)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 使用backward求梯度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "除了使用torch.autograd计算梯度以外，还可以使用backward计算梯度\n",
    "\n",
    "请看https://www.cnblogs.com/marsggbo/p/11549631.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 前面我们求了损失函数\n",
    "\n",
    "mes = F.mse_loss(predict, w*x) #第一个系数是预测值，第二个参数是表达式，y=后面这部分的/也是准确值标签，和预测值相对"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor(1., grad_fn=<MeanBackward0>)"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mes # mes 是一个floattensor类型。我们可以使用这个类型的backward计算查看w的梯度"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "mes.backward()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([2.])"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "w.grad # 查看此时的梯度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 总结两种方式求梯度"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**总结：两种方式求梯度**\n",
    "\n",
    "- torch.autograd.grad(loss, [w1,w2...])\n",
    "\n",
    "- loss.backward()"
   ]
  },
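  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal side-by-side sketch of the two APIs on the same toy loss (my own illustration):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from torch.nn import functional as F\n",
    "\n",
    "x = torch.ones(1)\n",
    "predict = torch.ones(1)\n",
    "\n",
    "# Method 1: torch.autograd.grad returns the gradients directly\n",
    "w1 = torch.full([1], 2., requires_grad=True)\n",
    "loss1 = F.mse_loss(predict, w1 * x)\n",
    "grad_w1, = torch.autograd.grad(loss1, [w1])\n",
    "\n",
    "# Method 2: loss.backward() accumulates gradients into each tensor's .grad\n",
    "w2 = torch.full([1], 2., requires_grad=True)\n",
    "loss2 = F.mse_loss(predict, w2 * x)\n",
    "loss2.backward()\n",
    "\n",
    "grad_w1, w2.grad  # both tensor([2.])"
   ]
  },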
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# softmax"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "https://blog.csdn.net/bitcarmanlee/article/details/82320853\n",
    "\n",
    "该元素的softmax值，就是该元素的指数与所有元素指数和的比值。所以数值大的会被放大。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([0.5787, 0.7905, 0.3651])"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = torch.rand(3) # 在建立变量的时候应该手动设置是否追踪求导\n",
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([0.5787, 0.7905, 0.3651], requires_grad=True)"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a.requires_grad_()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([0.3286, 0.4061, 0.2654], grad_fn=<SoftmaxBackward>)"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "p = F.softmax(a, dim=0)\n",
    "p"
   ]
  },
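  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since `a` requires grad, we can also inspect softmax's gradient (my own sketch). Analytically $\\frac{\\partial p_i}{\\partial a_j} = p_i(\\delta_{ij} - p_j)$, so backpropagating from a single output $p_1$ yields one row of that Jacobian:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Gradient of the single output p[1] w.r.t. the input a.\n",
    "# retain_graph=True keeps the graph alive so other outputs could be backpropped too.\n",
    "torch.autograd.grad(p[1], [a], retain_graph=True)\n",
    "# expected entries: (-p1*p0, p1*(1 - p1), -p1*p2)"
   ]
  },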
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "188px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
